Assembly and Comparative Analysis of the Complete Mitochondrial Genome of Saussurea inversa (Asteraceae)

Saussurea inversa is a perennial herb used in traditional Chinese medicine and is effective against rheumatoid arthritis. In this study, we sequenced the complete mitochondrial (mt) genome of S. inversa (GenBank accession number: ON584565.1). The circular mt genome of S. inversa was 335,372 bp in length, containing 62 genes, including 33 mRNAs, 22 tRNAs, 6 rRNAs, and 1 pseudogene, along with 1626 open reading frames. The GC content was 45.14%. Predictive analysis revealed substantial RNA editing, with ccmFn being the most abundantly edited gene, showing 36 sites. Gene migration between the mt and chloroplast (cp) genomes of S. inversa was observed through the detection of homologous gene fragments. Phylogenetic analysis revealed that S. inversa was clustered with Arctium tomentosum (Asteraceae). Our findings provide extensive information regarding the mt genome of S. inversa and help lay the foundation for future studies on its genetic variations, phylogeny, and breeding via the analysis of the mt genome.


Introduction
S. inversa commonly known as "Snow Lotus" in Tibetan medicine, is a perennial herb from the Asteraceae family [1], mainly found on the Qinghai-Tibetan Plateau.It grows in alpine limestone flats at altitudes of 4700-5400 m [2] and is a typical alpine plant well-adapted to the extreme environment [3].It is effective against rheumatoid arthritis [4] and menstrual disorders when soaked in wine and then consumed; however, its specific mechanism of action needs further study.There is limited literature on S. inversa, particularly in genomics research [5].In plants, mitochondrial genomes exhibit unique evolutionary patterns, they have a high rearrangement but a low mutation rate, as well as a large size [6].They originate from a bacterial ancestor and maintain their own genome, which is expressed by designated mitochondrial transcription and translation machinery.This machinery differs from that used for nuclear gene expression, as the mitochondrial protein synthesis machinery is structurally and functionally very different from that governing eukaryotic cytosolic translation [7].This study aims to analyze the complete mt genome of S. inversa to provide a scientific basis for understanding the adaptability of alpine plants to extreme environments and to help lay the foundation for future studies on the genetic variations, phylogeny, and breeding of S. inversa.It also provides theoretical support for the conservation and utilization of medicinal economic plants.

Materials and Methods
In this study, fresh leaves of S. inversa were collected from Daban Mountain (101 • 40 ′ 19 ′′ E, 37 • 35 ′ 21 ′′ N), a branch of the Qilian Mountains in the northeast of the Qinghai-Tibetan Plateau at an altitude of 4000 m.A specimen was deposited at the Plateau Plant Laboratory, College of Eco-Environmental Engineering, Qinghai University, Xining, China (Wubin Dai, daiwubin2024@163.com),under the voucher number DWB-2022-SH0615 (Figure 1).

Materials and Methods
In this study, fresh leaves of S. inversa were collected from Daban Mountain (101°40′19″ E, 37°35′21″ N), a branch of the Qilian Mountains in the northeast of the Qinghai-Tibetan Plateau at an altitude of 4000 m.A specimen was deposited at the Plateau Plant Laboratory, College of Eco-Environmental Engineering, Qinghai University, Xining, China (Wubin Dai, daiwubin2024@163.com),under the voucher number DWB-2022-SH0615 (Figure 1).The leaves were washed, frozen in liquid nitrogen, and ground to extract total DNA using the CTAB method.Second-generation sequencing, including sample quality detection, library construction, library quality detection, and the library sequencing process, was performed using the Illumina Novaseq 6000 platform (Illumina, Shanghai, China) according to the standard protocol provided by the manufacturer.The genomic DNA was randomly interrupted, and then the large fragments were enriched and purified using magnetic beads.Next, the large fragments were cut and recovered, and the fragmented DNA was repaired.After library construction, a certain concentration and volume of the DNA library was added to the flow cell, and the flow cell was transferred to the Oxford The leaves were washed, frozen in liquid nitrogen, and ground to extract total DNA using the CTAB method.Second-generation sequencing, including sample quality detection, library construction, library quality detection, and the library sequencing process, was performed using the Illumina Novaseq 6000 platform (Illumina, Shanghai, China) according to the standard protocol provided by the manufacturer.The genomic DNA was randomly interrupted, and then the large fragments were enriched and purified using magnetic beads.Next, the large fragments were cut and recovered, and the fragmented DNA was repaired.After library construction, a certain concentration and volume of the DNA library was added to the flow cell, and the flow cell was transferred to the Oxford Nanopore PromethION sequencer for real-time single-molecule sequencing.The number of reads was 918,028, and the mean read length was 10,171.The resulting readable raw data were uploaded to GenBank, BioProject, and BioSample, with the SRA numbers PRJNA863745, SAMN30061603, and SRR20747203, respectively.
The sequencing data were assembled using Canu (https://canu.readthedocs.io/,accessed on 15 July 2022) following quality control.Annotations were performed for the mt genome using ORFF (www.ncbi.nlm.nih.gov/orffinder,accessed on 15 July 2022), for tRNA using tRNAscanSE [8], and for the encoding protein and rRNA using BLAST (blast.ncbi.nlm.nih.gov).RNA editing sites were predicted using PREP Suite [9].An mt genome map of S. inversa and the structures of the genes that were difficult to annotate was prepared and analyzed using OGVIEW [10] (http://www.1kmpg.cn/pmgmap,accessed on 15 July 2022).
Conserved CDSs (coding sequences) of eight Asteraceae, two Rosaceae, and one Leguminosae (outgroup) were selected to draw a maximum likelihood evolutionary tree.These sequences were compared with multiple sequences using MAFFT software (v7.427,Auto mode), and the compared sequences were joined head to tail and then trimmed with trimAl (v1.4.rev15).The model prediction was carried out using Jmodeltest-2.1.10software after trimming to confirm that the model was of the GTR type.The bootstrap test was performed 1000 times, and support values, shown as percentages, are shown at the nodes [11].
Prediction of RNA editing sites.RNA editing is a generic term comprising a variety of processes that alter the DNA-encoded sequence of a transcribed RNA by inserting, deleting, or modifying nucleotides in the transcripts [9].For instance, restoration of the start codon of the atp6 gene from ACG to ATG in Table 1 requires RNA editing.There are large amounts of RNA editing for the genes listed above, and the number of RNA edits for each gene is shown in Figure 3.The codon position, change of amino acids, and type of RNA editing are listed in Supplementary Table S1.The highest number of RNA edits was for ccmFn, with 36, and the lowest was for atp6, with only one.After RNA editing, 44.67% of the amino acids retained their hydrophobicity, while 8.05% of hydrophobic amino acids became hydrophilic, and 47.28% of hydrophilic amino acids became hydrophobic (Table 2).Phylogenetic analysis.To understand the process of evolution of the S. inversa mt genome, we conducted a phylogenetic analysis of the S. inversa mt genome and the published mt genomes of 12 plants.The clustering in the phylogenetic tree is consistent with the relationships of these species at the family and genus levels, indicating that the mt genome-based clustering results are reliable.The phylogenetic tree (Figure 4) revealed that S. inversa is closely related to A. tomentosum (Asteraceae).The reliability of the support values of each node was above 88%, indicating that the evolutionary tree accurately reflects the genetic distances between the listed species.Phylogenetic analysis.To understand the process of evolution of the S. inversa mt genome, we conducted a phylogenetic analysis of the S. inversa mt genome and the published mt genomes of 12 plants.The clustering in the phylogenetic tree is consistent with the relationships of these species at the family and genus levels, indicating that the mt genome-based clustering results are reliable.The phylogenetic tree (Figure 4) revealed that S. inversa is closely related to A. tomentosum (Asteraceae).The reliability of the support values of each node was above 88%, indicating that the evolutionary tree accurately reflects the genetic distances between the listed species.Phylogenetic analysis.To understand the process of evolution of the S. inversa mt genome, we conducted a phylogenetic analysis of the S. inversa mt genome and the published mt genomes of 12 plants.The clustering in the phylogenetic tree is consistent with the relationships of these species at the family and genus levels, indicating that the mt genome-based clustering results are reliable.The phylogenetic tree (Figure 4) revealed that S. inversa is closely related to A. tomentosum (Asteraceae).The reliability of the support values of each node was above 88%, indicating that the evolutionary tree accurately reflects the genetic distances between the listed species.Repeat sequence analysis.Interspersed repeat sequences are repetitive sequences that are scattered in the genome.In the S. inversa mt genome, we identified a total of 206 interspersed repeats with a length greater than or equal to 30 bp; of these, 112 were forward repeats and 88 were palindrome repeats.The length of the longest forward repeat sequence was 16,421 bp, and that of the longest palindrome repeat sequence was 518 bp.The distribution of the lengths of the forward and palindrome repeats is shown in Figure 5.The abundance of both types of repeats was the highest when the repeats were in the range of 30-39 bp.
Repeat sequence analysis.Interspersed repeat sequences are repetitive sequence that are scattered in the genome.In the S. inversa mt genome, we identified a total of 20 interspersed repeats with a length greater than or equal to 30 bp; of these, 112 were fo ward repeats and 88 were palindrome repeats.The length of the longest forward repea sequence was 16,421 bp, and that of the longest palindrome repeat sequence was 518 bp The distribution of the lengths of the forward and palindrome repeats is shown in Figur 5.The abundance of both types of repeats was the highest when the repeats were in th range of 30-39 bp.Analysis of the codon composition.Due to codon concatenation, each amino aci corresponds to a minimum of one codon and a maximum of six codons.There is grea variation in the rate of codon usage in the genomes of different species and organism This inequality in synonymous codon usage is called relative synonymous codon usag (RSCU).This preference is believed to be the combined result of natural selection, specie mutation, and genetic drift [16].We used a self-coded Perl script to analyze the codo composition of the S. inversa mt genome.The results are shown in Table 3.The number o codons in all coding genes was 9881, and the relative RSCU in the S. inversa mt genome shown in Figure 6.There were 31 codons with an RSCU > 1, indicating that the usag Analysis of the codon composition.Due to codon concatenation, each amino acid corresponds to a minimum of one codon and a maximum of six codons.There is great variation in the rate of codon usage in the genomes of different species and organisms.This inequality in synonymous codon usage is called relative synonymous codon usage (RSCU).This preference is believed to be the combined result of natural selection, species mutation, and genetic drift [16].We used a self-coded Perl script to analyze the codon composition of the S. inversa mt genome.The results are shown in Table 3.The number of codons in all coding genes was 9881, and the relative RSCU in the S. inversa mt genome is shown in Figure 6.There were 31 codons with an RSCU > 1, indicating that the usage frequency of these codons is greater than that of other synonymous codons.Among these, 28 codons ending with the A/T base were identified, accounting for 90.32% of the codons, indicating that frequently used codons tend to end with the A/T base.frequency of these codons is greater than that of other synonymous codons.Among these, 28 codons ending with the A/T base were identified, accounting for 90.32% of the codons, indicating that frequently used codons tend to end with the A/T base.RSCU values are the number of times a particular codon was observed relative to the number of times that codon would be expected for uniform synonymous codon usage.
Analysis of homologous fragments of mitochondria and chloroplasts.Using the BLAST tool, we screened the fragments of the S. inversa mt and cp genomes [5] exhibiting > 70% similarity and performed homologous fragment analysis (Figure 7).We screened out 15 homologous fragments with a total length of 9078 bp, which accounted for 2.70% of the mt genome (Table 3).These homologous fragments contained 14 annotated genes, of which six were tRNA genes, namely tRNA-CCA, tRNA-UGG, tRNA-GUC, tRNA-GUG, tRNA-GUU, and tRNA-CAU, while the others were rrn16, psaB, psaA, ycf2, petL, petG, rbcL, and rps14.Analysis of homologous fragments of mitochondria and chloroplasts.Using the BLAST tool, we screened the fragments of the S. inversa mt and cp genomes [5] exhibiting > 70% similarity and performed homologous fragment analysis (Figure 7).We screened out 15 homologous fragments with a total length of 9078 bp, which accounted for 2.70% of the mt genome (Table 3).These homologous fragments contained 14 annotated genes, of which six were tRNA genes, namely tRNA-CCA, tRNA-UGG, tRNA-GUC, tRNA-GUG, tRNA-GUU, and tRNA-CAU, while the others were rrn16, psaB, psaA, ycf2, petL, petG, rbcL, and rps14.

Discussion
Mitochondria have a relatively independent genetic transcription system [17].However, the mt genome contains not only its genes but also some genes from the cp genome,

Discussion
Mitochondria have a relatively independent genetic transcription system [17].However, the mt genome contains not only its genes but also some genes from the cp genome, and they have multiple types of repetitive sequences and relatively conserved coding sequences.This suggests that these genes may have migrated during evolution, and further investigation is needed to understand the specific mechanism.Rapid advances in genome sequencing technology have accelerated the study of the mt genome.This study described the basic features of the mt genome of S. inversa for the first time, which provides an important foundation for understanding the function, inheritance, and evolutionary trajectory of the mt genome.The mt genome of S. inversa is a circular sequence with a length of 335,372 bp and a GC content of 45.14%.Plants exhibit large mitochondrial genomes of 66 Kb to 11.3 Mb, with large intergenic repetitions prone to recombination [18].The size of the mitochondrial genome varies considerably; despite this, the GC content is relatively conservative.For example, the GC content was found to be 45.08% for watermelon (Citrullus lanatus [Thunb.]Matsum.Et Nakai) and 44.77% for melon (Cucumis melo L.) [19].which is similar to that of S. inversa.We performed BLAST analysis of the mt sequence and annotated the sequence using software, and we found 33 protein-coding genes, 22 tRNA genes, 4 rRNA genes, and 1626 ORFs in the mt genome.Since sequence duplications can lead to intermolecular recombination in mitochondria, it is particularly important to perform repetitive sequence analysis, and we analyzed the dispersed repetitive sequences of the mt genome of S. inversa.The results showed a total of 206 dispersed repeat sequences with lengths greater than or equal to 30 bp.The length of the longest forward repeat sequence was 16,421 bp, and that of the longest palindrome repeat sequence was 518 bp.
RNA editing is a post-transcriptional modification process that alters the RNA sequence relative to the genomic blueprint.In plant mitochondria, the most common type is C-to-U, and the absence of C-to-U RNA editing results in abnormal plant development, such as etiolation and albino leaves, aborted embryonic development, and retarded seedling growth [20].In this study, 497 RNA editing sites were identified in 34 coding genes of the S. inversa genome, with 29 codon shift types.Among the codon transfer types, TCA→TTA was the most common, which is consistent with the findings of Qiao et al. [21], who noted 69 editing sites.After RNA editing, 8.12% of hydrophobic amino acids became hydrophilic, and 47.28% of hydrophilic amino acids became hydrophobic.Consistent results exist for the Bupleurum chinense DC. mt genome, where the most abundant transfer type in the plant was TCA→TTA, totaling 78, which was edited to change the hydrophobicity of more than half of the amino acids [21].The selection of editing sites in the genome of S. inversa was strongly biased, with all editing sites being C-to-U edited, which is the most common type of editing in plant mt genomes [20].It was shown that RNA editing occurring at the second position of the codon accounted for more than half of the total number of edits [22].In the S. inversa mt genome, 78.87% of the editing sites were also located at the second base of the triplet codon, which is consistent with previous findings.In addition, after RNA editing, the encoded amino acids were partially converted into termination codons [22].In the mt genome of S. inversa, 0.80% of the amino acids were edited into termination codons, resulting in an early stop of the coding process, which altered the gene's function.
Plant DNA transfer between organellar and nuclear genomes and between species occurs frequently, and sequencing analyses have identified many DNA transfer events between different plant genomes (mitochondrial, nuclear, and chloroplast).It was found that DNA transfer events primarily involve the transfer of DNA fragments from the organelle genome to the nuclear genome, followed by the transfer of nuclear and plastid genomes to the mt genome [19].In the present study, we found that the total length of homologous fragments transferred from the cp genome to the mt genome of S. inversa was 9078 bp, and these homologous fragments contained eight annotated genes, six of which were tRNA genes.This result is similar to that of Ma et al. [23], who found that the fragment of the Acer truncatum cp genome transferred to the mt genome contained six integrative genes, five of which were tRNA genes.The percentage of transferred fragments in the mt genome of S. inversa was 2.7%, which is similar to the data previously reported for A. truncatum (2.36%) [23] and Salix suchowensis (2.8%) [24], but lower than that of Suaeda glauca (5.18%) [25].In melon mitochondria, 48.62% of the sequence genome was homologous to the nuclear genome, and 1048 fragments in the mitochondria corresponded to 3391 fragments on the nuclear genome.The 1048 homologous sequences in the mitochondrial genome ranged from 214 to 6120 bp, with an average length of 941 bp [19].However, the nuclear genes of S. inversa are not yet publicly available, so subsequent analyses will be carried out at a later time.
We analyzed the codons in the S. inversa mt genome, and the RSCU value reflects the ratio of the actual frequency of use of a codon to the theoretical frequency of use in the absence of usage bias; if RSCU = 1, it means that there is no bias in the use of the codon, and if RSCU < 1, it means that the actual frequency of use of the codon is lower than that of the other synonymous codons.Conversely, if RSCU > 1, it means that the actual frequency of use of the codon is higher than the usage frequency of other synonymous codons [26].The results of the analysis showed that there were 31 codons with an RSCU > 1, indicating that the usage frequency of these codons is greater than that of other synonymous codons.Among these, 28 codons ending with the A/T base were identified, accounting for 90.32% of the codons, indicating that frequently used codons tend to end with the A/T base.

Conclusions
In this study, the mt genome of S. inversa was sequenced, assembled, and annotated, and the DNA and amino acid sequences of the annotated genes were analyzed.The mt genome of S. inversa is circular and has a total length of 335,372 bp.A total of 62 genes were annotated in the mt genome, including 33 protein-coding genes, 22 tRNA genes, and 6 rRNA genes.We analyzed repeat sequences, RNA editing processes, and codon preferences in the mt genome of S. inversa.We observed gene transfer between the mt and cp genomes of S. inversa by examining homologous fragments.In addition, our results showed that although the size of plant mt genomes varies greatly, their GC content is relatively conserved during evolution, suggesting that the mt gene is conserved during evolution.This study provides extensive information about the mt genome of S. inversa.Importantly, this study lays the foundation for future studies on the genetic variation, phylogeny, and plateau adaptation of S. inversa using the mt genome.

Figure 2 .
Figure 2. Mt genome map of S. inversa.The forward coding gene is on the outside of the circle, and the reverse coding gene is inside the circle.The colored parabola in the center of the circle represents the dispersed repeats.

Figure 2 .
Figure 2. Mt genome map of S. inversa.The forward coding gene is on the outside of the circle, and the reverse coding gene is inside the circle.The colored parabola in the center of the circle represents the dispersed repeats.

Figure 3 .
Figure 3.The number of RNA editing sites.

Figure 3 .
Figure 3.The number of RNA editing sites.

Figure 3 .
Figure 3.The number of RNA editing sites.

Figure 5 .
Figure 5.The interspersed repeat sequences in the S. inversa mt genome.(A) The four intersperse repeat types are distributed throughout the genome; the two black lines represent the mt genom and the same repeats are associated with the line segments.(B) Distribution of the lengths of th interspersed repeats in the mt genome.The abscissa indicates the type of interspersed repeat, an the ordinate indicates the number of scattered repeats.F for Forward, P for Palindromic, R for Re verse, and C for Complement.

Figure 5 .
Figure 5.The interspersed repeat sequences in the S. inversa mt genome.(A) The four interspersed repeat types are distributed throughout the genome; the two black lines represent the mt genome, and the same repeats are associated with the line segments.(B) Distribution of the lengths of the interspersed repeats in the mt genome.The abscissa indicates the type of interspersed repeat, and the ordinate indicates the number of scattered repeats.F for Forward, P for Palindromic, R for Reverse, and C for Complement.

Figure 6 .
Figure 6.RSCU in the S. inversa mt genome.The different amino acids are shown on the x-axis.RSCU values are the number of times a particular codon was observed relative to the number of times that codon would be expected for uniform synonymous codon usage.

Figure 6 .
Figure 6.RSCU in the S. inversa mt genome.The different amino acids are shown on the x-axis.RSCU values are the number of times a particular codon was observed relative to the number of times that codon would be expected for uniform synonymous codon usage.

Figure 7 .
Figure 7.Comparison of a homologous fragment in the S. inversa cp genome (GenBank accession number: MT554929) to that in the mt genome.The green line segment of the circle represents the S. inversa cp genome, and the red line segment represents the S. inversa mt genome.Genes from the same complex are labelled with the same colours.The green line segment in the circle connects the start and end points of the transferred gene fragments.The width of the green line segment represents the size of the transferred fragment.

Figure 7 .
Figure 7.Comparison of a homologous fragment in the S. inversa cp genome (GenBank accession number: MT554929) to that in the mt genome.The green line segment of the circle represents the S. inversa cp genome, and the red line segment represents the S. inversa mt genome.Genes from the same complex are labelled with the same colours.The green line segment in the circle connects the start and end points of the transferred gene fragments.The width of the green line segment represents the size of the transferred fragment.

Table 1 .
Basic information of the S. inversa mt genome genes.

Table 1 .
Basic information of the S. inversa mt genome genes.

Table 2 .
Prediction of RNA editing sites.

Table 3 .
Comparison of a homologous fragment in the S. inversa cp genome to that in the mt genome.

Table 3 .
Comparison of a homologous fragment in the S. inversa cp genome to that in the mt genome.